
AI Workload


Beyond Connectivity: An Open Architecture for AI-RAN Convergence in 6G

Polese, Michele, Mohamadi, Niloofar, D'Oro, Salvatore, Bonati, Leonardo, Melodia, Tommaso

arXiv.org Artificial Intelligence

Abstract--Data-intensive Artificial Intelligence (AI) applications at the network edge demand a fundamental shift in Radio Access Network (RAN) design, from merely consuming AI for network optimization to actively enabling distributed AI workloads. This presents a significant opportunity for network operators to monetize AI while leveraging existing infrastructure. To realize this vision, this article presents a novel converged O-RAN and AI-RAN architecture for unified orchestration and management of telecommunications and AI workloads on shared infrastructure. The proposed architecture extends the Open RAN principles of modularity, disaggregation, and cloud-nativeness to support heterogeneous AI deployments. We introduce two key architectural innovations: (i) the AI-RAN Orchestrator, which extends the O-RAN Service Management and Orchestration (SMO) framework to enable integrated resource allocation across RAN and AI workloads; and (ii) AI-RAN sites that provide distributed edge AI platforms with real-time processing capabilities. The proposed architecture enables flexible orchestration, meeting the requirements for managing heterogeneous workloads at different time scales while maintaining open, standardized interfaces and multi-vendor interoperability.

This paper has been submitted to IEEE for publication. M. Polese, L. Bonati, and T. Melodia are with the Institute for the Wireless Internet of Things, Northeastern University, Boston, MA, USA. This article is based upon work partially supported by the NTIA PWSCIF under Award No. 25-60-IF054, the U.S. NSF under award CNS-2112471, and by OUSD(R&E) through Army Research Laboratory Cooperative Agreement Number W911NF-24-2-0065.
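The joint allocation problem such an orchestrator faces can be sketched with a toy priority policy (a hypothetical illustration, not the paper's algorithm; the job names and capacities below are invented):

```python
def allocate(gpu_total, ran_demand, ai_jobs):
    # Hypothetical policy: RAN processing gets strict priority because it
    # runs under real-time deadlines; AI workloads then share whatever
    # GPU capacity remains, largest request first.
    ran_grant = min(ran_demand, gpu_total)
    free = gpu_total - ran_grant
    ai_grants = {}
    for name, want in sorted(ai_jobs.items(), key=lambda kv: -kv[1]):
        ai_grants[name] = min(want, free)
        free -= ai_grants[name]
    return ran_grant, ai_grants

# 10 GPUs at an edge site: RAN needs 4, AI jobs ask for 8 in total.
ran, ai = allocate(10, 4, {"training": 5, "inference": 3})
```

A real orchestrator would run such a loop at multiple time scales, per the article: fast cycles for RAN-side control and slower ones for AI workload placement.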


Tensor Program Optimization for the RISC-V Vector Extension Using Probabilistic Programs

Peccia, Federico Nicolas, Haxel, Frederik, Bringmann, Oliver

arXiv.org Artificial Intelligence

RISC-V provides a flexible and scalable platform for applications ranging from embedded devices to high-performance computing clusters. In particular, its RISC-V Vector Extension (RVV) is of interest for accelerating AI workloads. But writing software that efficiently utilizes the vector units of RISC-V CPUs without expert knowledge requires the programmer to rely on the autovectorization features of compilers or on hand-crafted libraries like muRISCV-NN. Smarter approaches, like autotuning frameworks, have so far lacked integration with the RISC-V RVV extension, heavily limiting the efficient deployment of complex AI workloads. In this paper, we present a workflow based on the TVM compiler to efficiently map AI workloads onto RISC-V vector units. Instead of relying on hand-crafted libraries, we integrated the RVV extension into TVM's MetaSchedule framework, a probabilistic program framework for tensor operation tuning. We implemented different RISC-V SoCs on an FPGA and tuned a wide range of AI workloads on them. We found that our proposal shows a mean improvement of 46% in execution latency when compared against the autovectorization feature of GCC, and 29% against muRISCV-NN. Moreover, the binary resulting from our proposal has a smaller code memory footprint, making it more suitable for embedded devices. Finally, we also evaluated our solution on a commercially available RISC-V SoC implementing the RVV 1.0 Vector Extension and found that it produces mappings that are 35% faster on average than the ones proposed by LLVM. We open-sourced our proposal for the community to expand it to target other RISC-V extensions.
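The essence of MetaSchedule-style tuning, sampling candidate schedules from a design space and keeping the best measured one, can be shown without any TVM dependency using a toy cost model (everything below is illustrative; real tuning measures latency on the actual RVV hardware):

```python
import random

def simulated_latency(n, vlen, tile):
    # Toy stand-in for a hardware measurement: wider vectors amortize
    # per-element work, each tile pays fixed loop overhead, and tile
    # sizes that don't divide n pay a scalar-epilogue penalty.
    full_tiles, rem = divmod(n, tile)
    return full_tiles * (tile / vlen + 2.0) + rem * 1.0

def tune(n, trials=200, seed=0):
    # Random search over (vector length, tile size) candidates.
    rng = random.Random(seed)
    space = {"vlen": [1, 2, 4, 8, 16], "tile": [8, 16, 32, 64, 128]}
    best = None
    for _ in range(trials):
        cand = {k: rng.choice(v) for k, v in space.items()}
        lat = simulated_latency(n, cand["vlen"], cand["tile"])
        if best is None or lat < best[0]:
            best = (lat, cand)
    return best

best_lat, best_cfg = tune(1024)
scalar_lat = simulated_latency(1024, vlen=1, tile=8)  # unvectorized baseline
```

MetaSchedule replaces this blind random search with learned cost models and evolutionary search, but the structure of the loop is the same.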


Load Balancing for AI Training Workloads

McClure, Sarah, Ratnasamy, Sylvia, Shenker, Scott

arXiv.org Artificial Intelligence

We investigate the performance of various load balancing algorithms for large-scale AI training workloads that are running on dedicated infrastructure. The performance of load balancing depends on both the congestion control and loss recovery algorithms, so our evaluation also sheds light on the appropriate choices for those designs as well.
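The trade-off under study can be seen in a minimal simulation (illustrative, not from the paper): per-flow hashing keeps packets in order but lets colliding elephant flows overload one link, while per-packet spraying balances near-perfectly at the cost of reordering, which is exactly why the choice interacts with congestion control and loss recovery:

```python
def ecmp_loads(flows, n_links=4):
    # Per-flow hashing: every packet of a flow takes the same link.
    loads = [0] * n_links
    for fid, pkts in flows:
        loads[fid % n_links] += pkts  # toy hash: flow id mod link count
    return loads

def spray_loads(flows, n_links=4):
    # Per-packet spraying: packets round-robin across all links.
    loads = [0] * n_links
    nxt = 0
    for _, pkts in flows:
        for _ in range(pkts):
            loads[nxt] += 1
            nxt = (nxt + 1) % n_links
    return loads

# Two elephant flows whose hashes collide, plus three small mice flows.
flows = [(0, 100), (4, 100), (1, 1), (2, 1), (3, 1)]
```

Here ECMP piles 200 packets onto one link while spraying spreads the same 203 packets almost evenly.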


Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona

Colangelo, Philip, Coskun, Ayse K., Megrue, Jack, Roberts, Ciaran, Sengupta, Shayan, Sivaram, Varun, Tiao, Ethan, Vijaykar, Aroon, Williams, Chris, Wilson, Daniel C., MacFarland, Zack, Dreiling, Daniel, Morey, Nathan, Ratnayake, Anuja, Vairamohan, Baskar

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is fueling exponential electricity demand growth, threatening grid reliability, raising prices for communities paying for new energy infrastructure, and stunting AI innovation as data centers wait for interconnection to constrained grids. This paper presents the first field demonstration, in collaboration with major corporate partners, of a software-only approach--Emerald Conductor--that transforms AI data centers into flexible grid resources that can efficiently and immediately harness existing power systems without massive infrastructure buildout. Conducted at a 256-GPU cluster running representative AI workloads within a commercial, hyperscale cloud data center in Phoenix, Arizona, the trial achieved a 25% reduction in cluster power usage for three hours during peak grid events while maintaining AI quality of service (QoS) guarantees. By orchestrating AI workloads based on real-time grid signals without hardware modifications or energy storage, this platform reimagines data centers as grid-interactive assets that enhance grid reliability, advance affordability, and accelerate AI's development.
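A minimal sketch of the kind of policy such a platform needs, curtailing flexible training jobs toward a floor while leaving QoS-protected inference untouched (hypothetical job names and numbers, not Emerald Conductor's actual algorithm):

```python
def shed_power(jobs, cap_watts):
    # jobs: list of dicts with "name", "watts", "floor", "protected".
    # During a grid event, cut unprotected jobs toward their floor until
    # total draw fits under the cap; protected jobs keep full power.
    alloc = {j["name"]: j["watts"] for j in jobs}
    excess = sum(alloc.values()) - cap_watts
    for j in jobs:
        if excess <= 0:
            break
        if j["protected"]:
            continue
        cut = min(excess, j["watts"] - j["floor"])
        alloc[j["name"]] -= cut
        excess -= cut
    return alloc

cluster = [
    {"name": "pretrain", "watts": 200, "floor": 100, "protected": False},
    {"name": "finetune", "watts": 120, "floor": 60, "protected": False},
    {"name": "inference", "watts": 80, "floor": 80, "protected": True},
]
alloc = shed_power(cluster, cap_watts=300)  # a 25% cut, as in the trial
```

The interesting engineering is in everything this sketch omits: reacting to real-time grid signals, checkpointing throttled jobs, and verifying that QoS guarantees actually hold.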


Efficient Unified Caching for Accelerating Heterogeneous AI Workloads

Wang, Tianze, Liu, Yifei, Chen, Chen, Zuo, Pengfei, Zhang, Jiawei, Weng, Qizhen, Chen, Yin, Han, Zhenhua, Zhao, Jieru, Chen, Quan, Guo, Minyi

arXiv.org Artificial Intelligence

Modern AI clusters, which host diverse workloads like data pre-processing, training and inference, often store the large-volume data in cloud storage and employ caching frameworks to facilitate remote data access. To avoid code-intrusion complexity and minimize cache space wastage, it is desirable to maintain a unified cache shared by all the workloads. However, existing cache management strategies, designed for specific workloads, struggle to handle the heterogeneous AI workloads in a cluster -- which usually exhibit heterogeneous access patterns and item storage granularities. In this paper, we propose IGTCache, a unified, high-efficacy cache for modern AI clusters. IGTCache leverages a hierarchical access abstraction, AccessStreamTree, to organize the recent data accesses in a tree structure, facilitating access pattern detection at various granularities. Using this abstraction, IGTCache applies hypothesis testing to categorize data access patterns as sequential, random, or skewed. Based on these detected access patterns and granularities, IGTCache tailors optimal cache management strategies including prefetching, eviction, and space allocation accordingly. Experimental results show that IGTCache increases the cache hit ratio by 55.6% over state-of-the-art caching frameworks, reducing the overall job completion time by 52.2%.
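The pattern-detection step can be illustrated with a much-simplified classifier (the thresholds and statistics below are invented stand-ins for IGTCache's actual hypothesis tests):

```python
from collections import Counter

def classify(accesses, seq_thresh=0.8, skew_thresh=0.5):
    # Sequential: most consecutive accesses advance by one item.
    # Skewed: a single item dominates the stream. Otherwise: random.
    if len(accesses) < 2:
        return "random"
    strides = [b - a for a, b in zip(accesses, accesses[1:])]
    if sum(s == 1 for s in strides) / len(strides) >= seq_thresh:
        return "sequential"
    top_count = Counter(accesses).most_common(1)[0][1]
    if top_count / len(accesses) >= skew_thresh:
        return "skewed"
    return "random"
```

Each verdict then maps to a policy: prefetch ahead of sequential streams, pin hot items for skewed ones, and avoid wasting space on random scans.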


POLARON: Precision-aware On-device Learning and Adaptive Runtime-cONfigurable AI acceleration

Lokhande, Mukul, Vishvakarma, Santosh Kumar

arXiv.org Artificial Intelligence

The increasing complexity of AI models requires flexible hardware capable of supporting diverse precision formats, particularly for energy-constrained edge platforms. This work presents PARV-CE, a SIMD-enabled, multi-precision MAC engine that performs efficient multiply-accumulate operations using a unified datapath for 4/8/16-bit fixed-point, floating-point, and posit formats. The architecture incorporates a layer-adaptive precision strategy to align computational accuracy with workload sensitivity, optimizing both performance and energy usage. The results demonstrate up to a 2x improvement in PDP and a 3x reduction in resource usage compared to SoTA designs, while retaining accuracy within 1.8% of the FP32 baseline. The architecture supports both on-device training and inference across a range of workloads, including DNNs, RNNs, RL, and Transformer models. The empirical analysis establishes POLARON, which incorporates PARV-CE, as a scalable and energy-efficient solution for precision-adaptive AI acceleration at the edge.
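The fixed-point side of such a datapath can be modeled in a few lines (a behavioral sketch only; PARV-CE is hardware, and the bit widths below are just two of its supported formats):

```python
def quantize(x, bits, frac_bits):
    # Signed, saturating fixed-point quantization: round to the nearest
    # representable value, then clamp to the n-bit two's-complement range.
    q = round(x * (1 << frac_bits))
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, q))

def mac(a, b, bits=8, frac_bits=4):
    # Multiply-accumulate the quantized operands in a wide integer
    # accumulator, then rescale the result back to a real number.
    acc = 0
    for x, y in zip(a, b):
        acc += quantize(x, bits, frac_bits) * quantize(y, bits, frac_bits)
    return acc / (1 << (2 * frac_bits))
```

Lowering `bits` trades accuracy for energy, which is what a layer-adaptive precision strategy exploits: precision-sensitive layers keep wide formats while tolerant ones run narrow.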


ADA: Automated Moving Target Defense for AI Workloads via Ephemeral Infrastructure-Native Rotation in Kubernetes

Sheriff, Akram, Huang, Ken, Nemeth, Zsolt, Nakhjiri, Madjid

arXiv.org Artificial Intelligence

This paper introduces the Adaptive Defense Agent (ADA), an innovative Automated Moving Target Defense (AMTD) system designed to fundamentally enhance the security posture of AI workloads. ADA operates by continuously and automatically rotating these workloads at the infrastructure level, leveraging the inherent ephemerality of Kubernetes pods. This constant managed churn systematically invalidates attacker assumptions and disrupts potential kill chains by regularly destroying and respawning AI service instances. This methodology, applying principles of chaos engineering as a continuous, proactive defense, offers a paradigm shift from traditional static defenses that rely on complex and expensive confidential or trusted computing solutions to secure the underlying compute platforms. At the same time, it agnostically supports the latest advancements in agentic and non-agentic AI ecosystems and solutions, such as agent-to-agent (A2A) communication frameworks or model context protocols (MCP). This AI-native infrastructure design, relying on the widely proliferated cloud-native Kubernetes technologies, facilitates easier deployment, simplifies maintenance through an inherent zero trust posture achieved by rotation, and promotes faster adoption. We posit that ADA's novel approach to AMTD provides a more robust, agile, and operationally efficient zero-trust model for AI services, achieving security through proactive environmental manipulation rather than reactive patching.
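The rotation loop itself is simple enough to model directly (a pure-Python simulation with an invented pod schema; in a real cluster the respawn would be Kubernetes deleting a pod and its Deployment's ReplicaSet recreating it):

```python
import itertools

_fresh_ids = itertools.count(1)

def rotate(pods, ttl_s, now_s):
    # Moving-target step: any pod older than the TTL is destroyed and
    # replaced by a fresh instance, so no service endpoint stays static
    # long enough for an attacker's assumptions about it to hold.
    out = []
    for pod in pods:
        if now_s - pod["started"] >= ttl_s:
            out.append({"id": next(_fresh_ids), "started": now_s})
        else:
            out.append(pod)
    return out

pods = [{"id": "old-pod", "started": 0}, {"id": "young-pod", "started": 50}]
pods = rotate(pods, ttl_s=60, now_s=70)
```

After a rotation pass, every pod's age is strictly below the TTL, bounding how long any compromised instance can persist.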


Arm's 2025 CPU plans include a big push in PC performance

PCWorld

You would think that Arm, which arguably has been the vanguard of the smartphone and PC industry's push for improved power efficiency, would double down on that strategy in its plans for 2025. PCWorld sat down at CES 2025 with Chris Bergey, senior vice president and general manager for Arm's client line of business. Bergey is responsible for both the smartphone as well as the laptop and tablet business, where Arm's designs are licensed by companies like Qualcomm and Apple, who tweak and eventually manufacture them as finished goods. Arm provides multiple types of licenses, but the two most common are a core license, where a customer buys a verified core that includes an Arm Cortex CPU, Mali GPU, or other intellectual property, and an architectural license, sold to companies like Apple, which gives them the freedom to design their own cores from scratch, though the result must be fully compatible with the Arm architecture.


Good things come in small packages: Should we adopt Lite-GPUs in AI infrastructure?

Canakci, Burcu, Liu, Junyi, Wu, Xingbo, Cheriere, Nathanaël, Costa, Paolo, Legtchenko, Sergey, Narayanan, Dushyanth, Rowstron, Ant

arXiv.org Artificial Intelligence

To match the booming demand for generative AI workloads, GPU designers have so far been trying to pack more and more compute and memory into single complex and expensive packages. However, there is growing uncertainty about the scalability of individual GPUs and thus AI clusters, as state-of-the-art GPUs are already displaying packaging, yield, and cooling limitations. We propose to rethink the design and scaling of AI clusters through efficiently connected large clusters of Lite-GPUs, GPUs with single, small dies and a fraction of the capabilities of larger GPUs. We think recent advances in co-packaged optics can be key in overcoming the communication challenges of distributing AI workloads onto more Lite-GPUs. In this paper, we present the key benefits of Lite-GPUs on manufacturing cost, blast radius, yield, and power efficiency; and discuss systems opportunities and challenges around resource, workload, memory, and network management.
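The yield argument follows directly from the classic Poisson die-yield model, Y = exp(-A * D0): smaller dies fail less often, so the same wafer area produces more good silicon. The defect density and die areas below are illustrative, not from the paper:

```python
import math

def poisson_yield(die_area_cm2, defect_density):
    # Poisson yield model: probability a die has zero killer defects.
    return math.exp(-die_area_cm2 * defect_density)

D0 = 0.2                       # defects per cm^2 (illustrative)
big = poisson_yield(6.0, D0)   # one large, reticle-sized die
lite = poisson_yield(1.5, D0)  # a Lite-GPU die a quarter of the area

# Expected good silicon from the same 6 cm^2 of wafer area:
good_big = 6.0 * big
good_lite = 4 * 1.5 * lite
```

Under these toy numbers the four small dies yield noticeably more usable area than the one large die, which is the manufacturing-cost case for Lite-GPUs.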


AI-RAN: Transforming RAN with AI-driven Computing Infrastructure

Kundu, Lopamudra, Lin, Xingqin, Gadiyar, Rajesh, Lacasse, Jean-Francois, Chowdhury, Shuvo

arXiv.org Artificial Intelligence

The radio access network (RAN) landscape is undergoing a transformative shift from traditional, communication-centric infrastructures towards converged compute-communication platforms. This article introduces AI-RAN, which integrates both RAN and artificial intelligence (AI) workloads on the same infrastructure. By doing so, AI-RAN not only meets the performance demands of future networks but also improves asset utilization. We begin by examining how RANs have evolved beyond mobile broadband towards AI-RAN and articulating the manifestations of AI-RAN in three forms: AI-for-RAN, AI-on-RAN, and AI-and-RAN. Next, we identify the key requirements and enablers for the convergence of communication and computing in AI-RAN. We then provide a reference architecture for advancing AI-RAN from concept to practice. To illustrate the practical potential of AI-RAN, we present a proof-of-concept that concurrently processes RAN and AI workloads utilizing NVIDIA Grace-Hopper GH200 servers. Finally, we conclude the article by outlining future work directions to guide further developments of AI-RAN.